本が到着順に並んでいない、代わりに ユニバーサルキーによって並べられている図書館を想像してみてください。これは、順次的から 関連コンテナへのパラダイムシフトです。代わりに ベクター を線形にスキャンするのではなく、 マップ または セット を使って対数時間の検索($O(\log n)$)を実現します。
1. 関連の本質
マップでは マップキーと値のペアを保存します。キーは文字列、カスタムオブジェクト、または キーと値のペア厳密な弱順序(Strict Weak Ordering)をサポートする任意の型として機能します。一方、 厳密な弱順序セットは、ユニークなキーのみを保持し、メンバーシップテストやフィルタリングに最適なツールです。 セット逆に、ユニークなキーのみを格納するため、メンバーシップテストやフィルタリングに最適です。
2. オーダード対アンオーダード
標準コンテナ(マップ、 セット)はキーをソート済みに保ちます。 C++11 標準 は、無順序のバリエーション(unordered_map)を導入しました。これらは ハッシュ関数 を使用して平均的に $O(1)$ のパフォーマンスを実現します。これらの高性能バケットを利用する際は、 C++11 ロゴ に注目してください。
main.py
TERMINALbash — 80x24
> Ready. Click "Run" to execute.
>
QUESTION 1
What happens when you use the subscript operator
m[k] on a map if the key k does not exist?It throws an out_of_range exception.
It returns a null pointer.
It adds a new element with key k and a value-initialized value.
The program fails to compile.
✅ Correct!
Subscripting a map behaves differently from a vector; it automatically inserts missing keys.❌ Incorrect
Unlike at(), the [] operator is an insertion-on-demand mechanism.QUESTION 2
Which of the following calls is ILLEGAL for a multiset
c and a vector v?copy(v.begin(), v.end(), inserter(c, c.end()));copy(v.begin(), v.end(), back_inserter(c));copy(c.begin(), c.end(), back_inserter(v));copy(c.begin(), c.end(), inserter(v, v.end()));✅ Correct!
Correct. back_inserter requires push_back, which associative containers like multiset do not support.❌ Incorrect
Associative containers lack push_back, so back_inserter cannot be used with them.QUESTION 3
Identify the correct way to define a variable to hold the result of calling
find on a map<string, vector> >.vector<int> res = m.find("key");map<string, vector>::iterator it = m.find("key"); >pair<string, vector> p = m.find("key"); >bool found = m.find("key");✅ Correct!
find returns an iterator to the element if found, or m.end() otherwise.❌ Incorrect
The find method always returns an iterator, not the value or a boolean.QUESTION 4
What is the primary advantage of using a
set over a vector for storing a list of excluded words?A set uses less memory than a vector.
A set allows for duplicate excluded words.
A set provides O(log N) lookup instead of O(N) linear search.
A set allows elements to be accessed by index.
✅ Correct!
Using exclude.find(word) on a set is significantly more efficient than std::find on a vector for large datasets.❌ Incorrect
While a vector is simpler, its linear search ($O(N)$) becomes a bottleneck compared to a set's $O(\log N)$.QUESTION 5
In the expression
++word_count.insert({word, 0}).first->second;, what does first refer to?The key of the inserted element.
The boolean indicating if insertion was successful.
An iterator to the element with the given key.
The first element in the entire map.
✅ Correct!
map::insert returns a pair where first is an iterator to the element.❌ Incorrect
The return of insert is a pair<iterator, bool>. first is the iterator.Module Assessment: High-Performance Text Processing
Implementing the Text-Query Architecture
You are tasked with building a search engine component (Exercise 12.28). The system must read a file and allow users to query words, returning the line numbers where those words appear. You must decide on the most efficient container strategy without using custom classes initially.
Q
1. Design the data structure to hold the word-to-line-number mapping. Which combination is most efficient?
Solution:
A
A
map<string, set> > is the ideal choice. The map allows $O(\log N)$ lookup for the word (the key), and the set stores line numbers (values) uniquely and in ascending order, facilitating clean result output.Q
2. Write a code snippet to implement the word counting logic that ignores case and punctuation (Required: 100 words).
Solution:
To implement case-insensitive word counting:
To implement case-insensitive word counting:
for (char &c : word) c = tolower(c);
word.erase(remove_if(word.begin(), word.end(), ispunct), word.end());
++word_count[word];
This logic iterates through each character of the string, converting it to lowercase using tolower. Then, it utilizes the remove_if algorithm with ispunct to shift all punctuation marks to the end of the string, followed by erase to truncate them. Finally, it uses the map's subscript operator to increment the count. This ensures that 'Example!', 'example.', and 'Example' are all normalized to 'example' and counted under a single unique key, maintaining data integrity.Q
3. How would you modify the system to safely share this data with a 'QueryResult' object without duplicating the entire map?
Solution:
Use
Use
shared_ptr. Specifically, the QueryResult should hold a shared_ptr<set> >. By using QueryResult(sought, loc->second, file) where the second argument is a shared pointer to the line number set, the data remains valid even if the original TextQuery object's scope ends, while avoiding expensive deep copies.